Total Jensen divergences: Definition, Properties and k-Means++ Clustering
Abstract
We present a novel class of divergences induced by a smooth convex function called total Jensen divergences. Those total Jensen divergences are invariant by construction to rotations, a feature yielding regularization of ordinary Jensen divergences by a conformal factor. We analyze the relationships between this novel class of total Jensen divergences and the recently introduced total Bregman divergences. We then proceed by defining the total Jensen centroids as average distortion minimizers, and study their robustness performance to outliers. Finally, we prove that the k-means++ initialization that bypasses explicit centroid computations is good enough in practice to guarantee probabilistically a constant approximation factor to the optimal k-means clustering.
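To make the seeding idea in the abstract concrete, here is a minimal sketch of k-means++ style initialization driven by a divergence instead of the squared Euclidean distance. It is an illustration under stated assumptions, not the paper's method: it uses the ordinary Jensen divergence J_F(p, q) = (F(p) + F(q))/2 - F((p + q)/2) and does not reproduce the conformal factor that defines the total Jensen divergence. The names `jensen_divergence`, `squared_norm`, and `kmeanspp_seeds` are hypothetical.

```python
import random


def jensen_divergence(F, p, q):
    """Ordinary Jensen divergence (F(p) + F(q)) / 2 - F((p + q) / 2)."""
    mid = [(a + b) / 2.0 for a, b in zip(p, q)]
    # Clamp at zero: for convex F the value is non-negative up to rounding.
    return max(0.0, (F(p) + F(q)) / 2.0 - F(mid))


def squared_norm(x):
    """Example convex generator F(x) = <x, x> (an assumption for this demo)."""
    return sum(v * v for v in x)


def kmeanspp_seeds(points, k, divergence, rng=None):
    """Pick k seeds from `points` without computing any centroid.

    The first seed is uniform; each later seed is drawn with probability
    proportional to its divergence to the closest seed chosen so far.
    """
    rng = rng or random.Random(0)
    seeds = [rng.choice(points)]
    closest = [divergence(x, seeds[0]) for x in points]
    while len(seeds) < k:
        if sum(closest) <= 0.0:
            # Every point coincides with a seed: fall back to uniform choice.
            new_seed = rng.choice(points)
        else:
            new_seed = rng.choices(points, weights=closest, k=1)[0]
        seeds.append(new_seed)
        closest = [min(c, divergence(x, new_seed))
                   for x, c in zip(points, closest)]
    return seeds


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.0, 9.9]]
    div = lambda p, q: jensen_divergence(squared_norm, p, q)
    print(kmeanspp_seeds(data, 2, div))
```

With F chosen as the squared norm, the Jensen divergence reduces to a quarter of the squared Euclidean distance, so this sketch behaves like ordinary k-means++ seeding; swapping in another convex generator changes only the `F` argument.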
Related resources
A generalization of the Jensen divergence: The chord gap divergence
We introduce a novel family of distances, called the chord gap divergences, that generalizes the Jensen divergences (also called the Burbea-Rao distances), and study its properties. It follows a generalization of the celebrated statistical Bhattacharyya distance that is frequently met in applications. We report an iterative concave-convex procedure for computing centroids, and analyze the perfo...
Clustering with Bregman Divergences: an Asymptotic Analysis
Clustering, in particular k-means clustering, is a central topic in data analysis. Clustering with Bregman divergences is a recently proposed generalization of k-means clustering which has already been widely used in applications. In this paper we analyze theoretical properties of Bregman clustering when the number of the clusters k is large. We establish quantization rates and describe the lim...
On Clustering Histograms with k-Means by Using Mixed α-Divergences
Clustering sets of histograms has become popular thanks to the success of the generic method of bag-of-X used in text categorization and in visual categorization applications. In this paper, we investigate the use of a parametric family of distortion measures, called the α-divergences, for clustering histograms. Since it usually makes sense to deal with symmetric divergences in information retr...
A Clustering Based Location-allocation Problem Considering Transportation Costs and Statistical Properties (RESEARCH NOTE)
Cluster analysis is a useful technique in multivariate statistical analysis. Different types of hierarchical cluster analysis and K-means have been used for data analysis in previous studies. However, the K-means algorithm can be improved using some metaheuristics algorithms. In this study, we propose simulated annealing based algorithm for K-means in the clustering analysis which we refer it a...
Non-flat Clustering with Alpha-divergences
The scope of the well-known k-means algorithm has been broadly extended with some recent results: first, the kmeans++ initialization method gives some approximation guarantees; second, the Bregman k-means algorithm generalizes the classical algorithm to the large family of Bregman divergences. The Bregman seeding framework combines approximation guarantees with Bregman divergences. We present h...
Journal title:
- CoRR
Volume abs/1309.7109
Pages -
Publication date 2013